-
Selecting representative samples plays an indispensable role in many machine learning and computer vision applications under limited resources (e.g., limited communication bandwidth and computational power). The Determinantal Point Process (DPP) is a widely used method for selecting the most diverse representative samples that can summarize a dataset. However, adapting it to different tasks remains an open challenge, because DPP is difficult to tune for a specific task. In contrast, Rate-Distortion (RD) theory provides a way to measure task-specific diversity. Optimizing RD for data selection is itself challenging, because the quantity to be optimized is the index set of the selected samples. To tackle these challenges, we first establish an inherent relationship between DPP and RD theory. This theoretical derivation paves the way for combining the advantages of RD and DPP in task-specific data selection. To this end, we propose RD-DPP, a novel method for task-specific data selection in multi-level classification tasks. Empirical studies on seven datasets using five benchmark models demonstrate the effectiveness of the proposed RD-DPP method. Our method also outperforms recent strong competing methods while exhibiting high generalizability to a variety of learning tasks.
-
The Determinantal Point Process (DPP) is a powerful technique for enhancing data diversity by promoting repulsion between similar elements in the selected samples. In particular, DPP-based Maximum A Posteriori (MAP) inference is used to identify the subsets with the highest diversity. However, the common assumption that all data samples are available at a single point limits its applicability to real-world scenarios in which samples are distributed across distinct sources with intermittent, bandwidth-limited connections. This paper proposes a distributed version of DPP inference to enhance multi-source data diversification under limited communication budgets. First, we convert the lower bound of the diversity-maximized distributed sample selection from a matrix determinant optimization into a simpler sum of individual terms. Next, the sink forms a determinant-preserved sparse representation of the selected samples as a surrogate for the collected samples and sends it back to the sources as lightweight messages, eliminating the need for raw data exchange. Our approach is inspired by the channel orthogonalization process of Multiple-Input Multiple-Output (MIMO) systems based on Channel State Information (CSI). Extensive experiments verify the superiority of our scalable method over the most commonly used data selection methods, including GreeDi, Greedymax, random selection, and stratified sampling, with a reduction of at least 12% in Relative Diversity Error (RDE). This enhanced diversity translates into a substantial improvement in the performance of various downstream learning tasks, including multi-level classification (2%-4% gain in accuracy), object detection (2% gain in mAP), and multiple-instance learning (1.3% gain in AUC).
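To make the idea of DPP-based MAP inference concrete, the following is a minimal sketch of the standard centralized greedy approximation: at each step it adds the item that maximizes the determinant of the kernel submatrix over the selected set. It is not the distributed method proposed in the abstract. The RBF kernel, the `gamma` parameter, the function names, and the toy points are all assumptions chosen for illustration.

```python
import math

def rbf_kernel(points, gamma=1.0):
    """Similarity kernel L[i][j] = exp(-gamma * ||x_i - x_j||^2) (illustrative choice)."""
    n = len(points)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]

def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def greedy_dpp_map(kernel, k):
    """Greedily pick up to k indices that maximize det(L_S) at every step."""
    selected = []
    for _ in range(k):
        best_item, best_det = None, 0.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            d = det(sub)
            if d > best_det:
                best_item, best_det = i, d
        if best_item is None:  # no candidate adds positive volume
            break
        selected.append(best_item)
    return selected

# Toy data: points 0 and 1 are near-duplicates; 2 and 3 are far from both.
points = [(0.0, 0.0), (0.05, 0.0), (3.0, 0.0), (0.0, 3.0)]
subset = greedy_dpp_map(rbf_kernel(points), k=3)
print(subset)
```

On this toy input the near-duplicate point is left out, because adding it would make two rows of the submatrix almost identical and shrink the determinant toward zero. Recomputing each determinant from scratch keeps the sketch short; practical implementations typically use incremental Cholesky updates instead.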
-
Deep neural networks, including transformers and convolutional neural networks (CNNs), have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., quantification of disease-related anomalous points in ECG and anomaly detection in signals). To address this challenge, we formally discuss and reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling of time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art MTSC methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC. The code is available at https://github.com/xiwenc1/TimeMIL.
-
Because wildfires have hugely destructive impacts on agriculture and food production, wildlife habitat, climate, human life, and ecosystems, timely discovery of fires enables a swift response before they grow out of control, minimizing the resulting damage. One of the emerging technologies for fire monitoring is the deployment of Unmanned Aerial Vehicles, owing to their high flexibility and maneuverability, lower human risk, and on-demand high-quality imaging capabilities. To realize a real-time system for fire detection and expansion analysis, fast and high-accuracy image-processing algorithms are required. Several studies have shown that deep learning methods can provide the most accurate response; however, the training time can be prohibitively long, especially when online learning is used to continually refine the developed model. Another challenge is the lack of large datasets for training a deep learning algorithm. To address both issues, we propose using a pretrained MobileNetV2 architecture to implement transfer learning, which requires a smaller dataset and reduces the computational complexity without compromising accuracy. In addition, we apply a data augmentation pipeline that simulates some extreme scenarios to help ensure the robustness of our approach. The test results show that our method maintains high identification accuracy in different situations: the original dataset (99.7%), with Gaussian blur (95.3%), and with additive Gaussian noise (99.3%).
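The two perturbations named in the results, Gaussian blur and additive Gaussian noise, can be sketched in a few lines. This is only an illustration on a tiny grayscale image stored as nested lists with values in [0, 1]; the 3x3 kernel, the `sigma` value, the edge handling, and the function names are assumptions, not the pipeline or parameters used in the paper.

```python
import random

def add_gaussian_noise(image, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise to each pixel and clip to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

def gaussian_blur_3x3(image):
    """Blur with a 3x3 binomial approximation of a Gaussian (weights sum to 16)."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Replicate edge pixels when the window leaves the image.
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc / 16.0
    return out

# Toy 4x4 image: a bright 2x2 square on a dark background.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
noisy = add_gaussian_noise(image)
blurred = gaussian_blur_3x3(image)
```

Evaluating a trained classifier separately on the clean, blurred, and noisy copies of a test set is one way to obtain a per-condition accuracy breakdown like the one reported above.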